1. Introduction

In survival analysis and reliability theory, the hazard rate (also known as failure rate) is a natural function 
to model the distribution of data. It describes the probability of instantaneous failure at time x, given 
the subject has functioned until x. The exponential distributions are the only distributions with constant 
hazard rate, which is related to the 'memoryless property' of this distribution. Other shapes of the hazard 
rate indicate whether the object suffers ageing (increasing hazard rate) or is getting more reliable having 
survived longer (decreasing hazard rate). 

In estimating a hazard function under the restriction that it is monotone, popular methods are maximum 
likelihood and isotonic least squares projection ([21], section 7.4). These estimators are typically piecewise 
constant and non-smooth. More recently, the method of monotonic rearrangements was studied in [15]. 
Depending on the choice of the initial estimator, these estimators can be smooth. Methods to obtain smooth 
estimators of the hazard rate include plug-in ratio estimators and smoothed empirical hazards as discussed 
in [20]. See [22] for an overview of the various estimators. These estimators are typically not monotone. In 
[7] the so-called maximum smoothed likelihood estimator was introduced, an estimator that is both smooth 
and monotone. In this paper, both non-smooth and smooth monotone estimators of a monotone
hazard rate will be studied. Before giving an outline of the paper, we say a few words on our motivation to study
this problem.

The problem of testing a null hypothesis of exponentiality (constant hazard rate) against the alternative
of a monotone hazard rate was extensively studied in the 1960s; see e.g. [19].
Only quite recently has the problem of testing the null hypothesis of monotonicity of a hazard rate received
attention. [11] consider a multiscale version of the Proschan-Pyke test, and compute critical values based on
the exponential distribution. [12] use an integral type test statistic that is based on second order differences 
of the empirical cumulative hazard function and approximate the critical values of the test using bootstrap 
samples from a well chosen smoothed version of the empirical cumulative hazard function. [2] studies the 
supremum distance between two estimators of the cumulative hazard and obtains critical values using the 
exponential distribution. An alternative approach to this testing problem is developed in [9]. There an 
integral-type test statistic is introduced and a bootstrap approach is used to determine approximate critical 
values. This approach is shown to be less conservative than methods based on the exponential distribution and 
less anticonservative than the method proposed in [12]. In order for the bootstrap method described in [9] to 
work well, estimators for a locally monotone hazard rate are needed that are smooth and uniformly consistent
on the interval of monotonicity and behave properly near the boundary of the interval of monotonicity. 

In this paper we concentrate on the nonparametric least squares method to estimate a locally monotone 
hazard rate and discuss smooth and non-smooth versions of this approach. It is well-known that the "raw" 
least-squares or maximum likelihood method yields inconsistent estimates at the boundary (this will also 
be seen in section 2). Following an approach introduced in the context of density estimation in [24, 25], 
we introduce a penalty at the endpoints in the least squares criterion. In Theorem 2.1 in section 2 the 
asymptotically optimal penalization constants, minimizing an asymptotic mean squared error criterion, are 
determined. The optimal order of the penalization constants turns out to be $n^{-2/3}$, if $n$ is the sample
size and it is assumed that the hazard is strictly increasing on the interval of interest. Somewhat different
recommendations were given in [24, 25], where penalization constants of the order $(\log n)/n$ and $1/\sqrt{n}$ were
used, respectively (see also Remark 2.2).

There are several methods that can be used to construct smooth estimators based on a basic non-smooth 
monotone estimator discussed in section 2. One method that automatically leads to monotone estimators, is 
kernel smoothing. In section 3 this method is described and the resulting estimator is shown to be asymp- 
totically normally distributed. Moreover, both locally and globally optimal bandwidths are determined for 
estimating the hazard rate. 

In section 4 smooth estimates based on penalizing the estimates of section 2 are studied. The penalization 
uses an integral over the square of the derivative of the hazard, as used in [23] and [17]. We show that 
full minimization of the penalized criterion yields a uniformly consistent estimate of the hazard, but gives 
inconsistent estimates of the derivative of the hazard at the boundary points, since the derivatives tend 
to zero at the boundary, as in [23] and [17]. We remedy the latter difficulty by introducing two boundary 
conditions in order to get consistent estimates of the derivative of the hazard, also at the boundary points. 
Having consistent estimates of the derivative of the hazard is important in generating bootstrap samples for 
finding critical values of (isotonic) tests for monotone hazards in the setting of [9]. 



2. Monotone least-squares estimates of the hazard 

Suppose we have a sample $X_1, \ldots, X_n$ from a distribution function $F_0$ on $[0, \infty)$, with density $f_0$ and hazard
function $h_0$. This latter function characterizes the distribution function $F_0$, which can be seen by the relation

\[ h_0(x) = -\frac{d}{dx} \log\{1 - F_0(x)\} = \frac{f_0(x)}{1 - F_0(x)} \]

with inverse

\[ F_0(x) = 1 - \exp\Big( -\int_0^x h_0(y)\, dy \Big). \]
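The inverse relation can be checked numerically. The sketch below is only an illustration: the helper name `cdf_from_hazard` and the trapezoidal grid size are assumptions, not part of the paper. It integrates a given hazard and compares the result with the closed form for a constant hazard (the exponential distribution).

```python
import numpy as np

def cdf_from_hazard(hazard, x, n_grid=20001):
    """Evaluate F(x) = 1 - exp(-int_0^x hazard(y) dy) with the trapezoidal rule.

    `hazard` is assumed to be a vectorized function of a NumPy array.
    """
    grid = np.linspace(0.0, x, n_grid)
    values = hazard(grid)
    cumulative_hazard = np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0
    return 1.0 - np.exp(-cumulative_hazard)

# Constant hazard: the exponential distribution with the same rate.
rate = 2.0
approx_exponential = cdf_from_hazard(lambda y: np.full_like(y, rate), 1.5)
exact_exponential = 1.0 - np.exp(-rate * 1.5)

# Linear hazard h(y) = y: F(x) = 1 - exp(-x^2 / 2).
approx_linear = cdf_from_hazard(lambda y: y, 1.2)
exact_linear = 1.0 - np.exp(-0.5 * 1.2 ** 2)
```

Both hazards are piecewise linear, so the trapezoidal rule is exact here up to rounding; for a general hazard the grid size controls the error.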







If one wants to estimate the hazard $h_0$ under the restriction that it is monotone on the interval $[0, a]$, one of
the simplest estimates is the least squares estimate $\hat h_n$, which minimizes the quadratic criterion

\[ \frac12 \int_0^a h(x)^2\, dx - \int_{[0,a]} h(x)\, d\mathbb{H}_n(x), \qquad (2.1) \]



under the restriction that $h$ is monotone. Here $\mathbb{H}_n$ is the empirical cumulative hazard function

\[ \mathbb{H}_n(x) = -\log\{1 - \mathbb{F}_n(x)\}, \qquad x < \max_i X_i, \]

and $\mathbb{F}_n$ is the empirical distribution function of the sample $X_1, \ldots, X_n$. The rationale behind this criterion
function is that $\mathbb{H}_n$ will be close to $H_0$ (defined as $H_0(x) = \int_0^x h_0(y)\, dy$) asymptotically and
$h \mapsto \frac12 \int_0^a h(x)^2\, dx - \int_{[0,a]} h(x)\, dH_0(x)$ is minimized by taking $h = h_0$ (which can be seen by 'completing the square'). Another
option is to use maximum likelihood methods, but in view of our restriction of the monotonicity hypothesis 
to an interval, this method has more complications in the present case, so we will concentrate on least 
squares methods in this paper. For specificity, we shall consider the hypothesis that h is nondecreasing on 
[0, a], although similar methods can be used if the hypothesis is that h is nonincreasing on [0, a] or monotone 
on a compact interval not including zero. 
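The 'completing the square' step behind the least squares criterion can be written out explicitly. This is a worked restatement, assuming that $H_0$ has density $h_0$ on $[0,a]$, so that $dH_0(x) = h_0(x)\,dx$:

```latex
\frac12 \int_0^a h(x)^2\,dx - \int_0^a h(x)\,h_0(x)\,dx
  = \frac12 \int_0^a \{h(x) - h_0(x)\}^2\,dx - \frac12 \int_0^a h_0(x)^2\,dx .
```

The last term does not depend on $h$, so the population version of the criterion is minimized at $h = h_0$.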
Fig 1. The left panel shows the hazard functions $h^{(d)}$ for $d = -1, -0.75, -0.50, -0.25$ (dashed), $d = 0$ (full curve) and
$d = 0.25, 0.50, 0.75, 1$ (dotted) corresponding to distribution functions (2.2). The stationary points are shown by the red dots.
The right panel shows the corresponding densities.



The solution of the problem of minimizing (2.1) is well-known, and found in the following way. Construct
the so-called cusum diagram, consisting of the point $(0,0)$, and the points

\[ \big(X_{(i)}, \mathbb{H}_n(X_{(i)}-)\big), \quad 1 \le i \le n, \ X_{(i)} < a, \qquad \big(a, \mathbb{H}_n(a-)\big), \]

where the $X_{(i)}$ are the order statistics of the sample, and where we assume $X_{(n)} > a$. Then the solution $\hat h_n$
of the minimization problem is given by the left-continuous derivative of the greatest convex minorant of
this cusum diagram.
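The construction above can be sketched in code. The following is a minimal illustration rather than the authors' implementation. The function names `gcm_slopes` and `ls_hazard` are hypothetical. The slopes of the greatest convex minorant are computed with the pool-adjacent-violators algorithm, a standard equivalent of taking its left derivative. The sample is assumed to have no ties and to satisfy $X_{(n)} > a$.

```python
import numpy as np

def gcm_slopes(x, y):
    """Slopes of the greatest convex minorant of the points (x[i], y[i]).

    x must be strictly increasing. One slope is returned per interval
    (x[i-1], x[i]]. Pool-adjacent-violators is applied to the raw increment
    slopes dy/dx with weights dx.
    """
    blocks = []  # each block: [sum of dy, sum of dx, number of intervals]
    for dy, dx in zip(np.diff(y), np.diff(x)):
        blocks.append([dy, dx, 1])
        # Merge while the previous block has a larger slope than the last one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            top = blocks.pop()
            blocks[-1][0] += top[0]
            blocks[-1][1] += top[1]
            blocks[-1][2] += top[2]
    return np.concatenate([np.full(count, sy / sx) for sy, sx, count in blocks])

def ls_hazard(sample, a):
    """Knots and values of the nondecreasing least squares hazard estimate on [0, a]."""
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    inside = xs[xs < a]
    k = inside.size
    if k == n:
        raise ValueError("the largest observation must exceed a")
    knots = np.concatenate([[0.0], inside, [a]])
    # H_n(X_(i)-) = -log(1 - (i - 1)/n) and H_n(a-) = -log(1 - k/n).
    heights = np.concatenate([[0.0], -np.log1p(-np.arange(k) / n), [-np.log1p(-k / n)]])
    return knots, gcm_slopes(knots, heights)

# Illustration on simulated standard exponential data (true hazard equal to 1).
rng = np.random.default_rng(0)
sample = rng.exponential(size=100)
knots, values = ls_hazard(sample, np.quantile(sample, 0.9))
```

Here `values[j]` is the estimate on `(knots[j], knots[j + 1]]`. The first value is always zero because the first two points of the cusum diagram have height zero, which is the boundary inconsistency discussed below.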

To illustrate the behavior of the estimators in this paper, we introduce the family of hazards $\{h^{(d)} : d \in
[-1, 1]\}$, also considered in [12]. The corresponding distribution functions on $(0,\infty)$ are given by

F(''\x) = 1 - exp {-\x - I {i (x - f ) V (f)^} - \dx^ + I (1)^} . (2.2) 

If $d > 0$ we get a strictly increasing hazard; if $d < 0$, the hazard is decreasing on

\[ \Big( \tfrac34 - \tfrac{2}{15} d - \tfrac{2}{15}\sqrt{d^2 - \tfrac{45}{4} d}\,,\ \tfrac34 - \tfrac{2}{15} d + \tfrac{2}{15}\sqrt{d^2 - \tfrac{45}{4} d} \Big) \]

and if $d = 0$ the hazard has a stationary point at $x = 3/4$. See Figure 1 for some hazards and corresponding
densities in this family.

Remark 2.1. Note that we need the constant | (|)^ in the exponent to make the distribution function zero 
at the left endpoint 0, but that this constant is missing in the formula given below (4.1) on p. 1121 in [12]. 

A picture of the cusum diagram and its greatest convex minorant (red) for a sample of size $n = 100$
from the distribution function $F^{(1)}$ on the interval $[0, (F^{(1)})^{-1}(0.95)]$ and the corresponding estimate of the
hazard function are shown in Figure 2.

The lemma below shows that on intervals that stay away from the boundary points $0$ and $a$, the hazard
estimator is uniformly consistent.

Lemma 2.1. Suppose $0 \le \alpha_n, \beta_n \to 0$ as $n \to \infty$. Let $h_0$ be continuous and nondecreasing on $[0,a]$. Then
for each $0 < \delta < a/2$,

\[ \sup_{x \in [\delta, a-\delta]} |\hat h_n(x) - h_0(x)| \to 0 \quad \text{with probability one.} \qquad (2.3) \]
Fig 2. The (unpenalized) cusum diagram and its greatest convex minorant (left panel) and the corresponding least squares
estimate of the hazard (right panel) for a sample of size $n = 100$ from the distribution function $F^{(1)}$ on the interval
$[0, (F^{(1)})^{-1}(0.95)]$. The real hazard is the black curve in the right panel.



Proof. The argument is similar to that in Theorem 3 in [6]. First note that $\mathbb{H}_n$ converges to $H_0$ uniformly on
$[0, a]$ almost surely by the Glivenko-Cantelli theorem. Since a.s. for any $\epsilon > 0$, $H_0 - \epsilon \le \hat H_n \le \mathbb{H}_n \le H_0 + \epsilon$
on $[0, a]$ for all $n$ sufficiently large (since $\hat H_n$ is the greatest convex minorant of $\mathbb{H}_n$ and $H_0 - \epsilon$ is a.s. a convex
minorant of $\mathbb{H}_n$ for $n$ sufficiently large), $\hat H_n$ converges to $H_0$ uniformly on $[0, a]$ almost surely.

Now fix $x \in (0, a)$. Then for each $\epsilon > 0$ such that $(x - \epsilon, x + \epsilon) \subset [0, a]$, we have by definition of $\hat h_n$

\[ \frac{\hat H_n(x) - \hat H_n(x - \epsilon)}{\epsilon} \le \hat h_n(x) \le \frac{\hat H_n(x + \epsilon) - \hat H_n(x)}{\epsilon}. \]

The left hand side converges a.s. to $(H_0(x) - H_0(x - \epsilon))/\epsilon$; the right hand side to $(H_0(x + \epsilon) - H_0(x))/\epsilon$.
Since $\epsilon$ was chosen arbitrarily, this shows (by continuity of $h_0$ on $[0, a]$) that $\hat h_n(x) \to h_0(x)$ w.p. 1. Uniform
convergence on $[\delta, a - \delta]$ follows by monotonicity of both $\hat h_n$ and $h_0$ and continuity of $h_0$ on $[0,a]$. □

It is well-known that this estimate has the undesirable feature of being inconsistent at the boundary points
$0$ and $a$, and indeed one notices in Figure 2 that the estimate $\hat h_n(0)$ is too low and the estimate $\hat h_n(a)$ is
too high. In fact, it immediately follows from the representation of $\hat h_n$ that $\hat h_n = 0$ on $(0, X_{(1)})$ for all $n$. To
remedy a similar problem in the context of maximum likelihood estimation of a monotone density, [24, 25]
suggest introducing a penalty at the endpoints. We also use that method in the present situation.

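To this end, we introduce the penalized cusum diagram, consisting of the point $(0, 0)$, and the points

\[ \big(X_{(i)}, \mathbb{H}_n(X_{(i)}-) + \alpha_n\big), \ X_{(i)} < a, \qquad \big(a, \mathbb{H}_n(a-) + \alpha_n - \beta_n\big), \qquad (2.4) \]

where $\alpha_n$ and $\beta_n$ are nonnegative penalty parameters. The left derivative of the greatest convex minorant of this cusum diagram
minimizes the criterion

\[ \frac12 \int_0^a h(x)^2\, dx - \int_{[0,a]} h(x)\, d\mathbb{H}_n(x) - \alpha_n h(0) + \beta_n h(a), \qquad (2.5) \]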

over all nondecreasing functions $h$ on $[0,a]$. Consistency of the resulting estimator on $[\delta, a - \delta]$ is obtained
by following the proof of Lemma 2.1. This characterization of the estimator also leads to consistency of $\hat h_n$
at the boundary points $0$ and $a$. Moreover, the optimal order of convergence to zero of the parameters $\alpha_n$
and $\beta_n$, which is important in order to get a feeling for what to do in practice, can be determined. In [24] it
is suggested to take a related penalty of order $(\log n)/n$ and in [25] to take a penalty of order $1/\sqrt{n}$. One
of the statements in the theorem below is that, under the assumption that $h_0$ stays away from zero and is
strictly increasing on $[0,a]$, the optimal penalty is of order $n^{-2/3}$.
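The effect of the penalties on the boundary values can be illustrated with a short sketch. It uses the elementary fact that the first slope of the greatest convex minorant of a finite diagram starting at $(0,0)$ is the smallest slope of a chord from the first point, and that the last slope is the largest slope of a chord into the last point. The function name `penalized_boundary_values` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def penalized_boundary_values(sample, a, alpha, beta):
    """Boundary values of the penalized least squares estimate on [0, a].

    The penalized cusum diagram has height H_n(X_(i)-) + alpha at X_(i) < a and
    height H_n(a-) + alpha - beta at a; the point (0, 0) is not shifted.
    """
    xs = np.sort(np.asarray(sample, dtype=float))
    n = xs.size
    inside = xs[xs < a]
    k = inside.size
    if k == n:
        raise ValueError("the largest observation must exceed a")
    x = np.concatenate([[0.0], inside, [a]])
    y = np.concatenate([[0.0],
                        alpha - np.log1p(-np.arange(k) / n),
                        [alpha - beta - np.log1p(-k / n)]])
    value_at_0 = np.min(y[1:] / x[1:])                         # smallest chord slope from (0, 0)
    value_at_a = np.max((y[-1] - y[:-1]) / (x[-1] - x[:-1]))   # largest chord slope into the last point
    return value_at_0, value_at_a

toy_sample = [0.5, 1.0, 1.5, 3.0]
raw_at_0, raw_at_a = penalized_boundary_values(toy_sample, 2.0, 0.0, 0.0)
pen_at_0, pen_at_a = penalized_boundary_values(toy_sample, 2.0, 0.1, 0.2)
```

Without penalties the value at zero is exactly zero; a positive `alpha` lifts it, and a positive `beta` lowers the value at `a`.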
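Theorem 2.1. Let $h_0$ be nondecreasing on $[0,a]$ with strictly positive and continuous (one-sided) derivatives
at $0$ and $a$. Let $0 < \alpha_n, \beta_n \to 0$. Then:

(i) For each $0 < \delta < a$, with probability one, for all $n$ sufficiently large

\[ \hat h_n(0) = \inf_{x \in [0,\delta]} \frac{\mathbb{H}_n(x) + \alpha_n}{x} \quad \text{and} \quad \hat h_n(a) = \sup_{x \in [a-\delta,a]} \frac{\mathbb{H}_n(a) - \beta_n - \mathbb{H}_n(x)}{a - x}. \]

(ii) The asymptotically MSE-optimal rates for the penalization parameters are $\alpha_n, \beta_n \asymp n^{-2/3}$.

(iii) Let $W$ be standard Brownian motion on $[0,\infty)$. Taking $\alpha_n = \alpha n^{-2/3}$ and $\beta_n = \beta n^{-2/3}$,

\[ n^{1/3} \big( \hat h_n(0) - h_0(0) \big) \xrightarrow{\mathcal{D}} \inf_{t > 0} \Big( \frac{W(h_0(0) t)}{t} + \frac{\alpha}{t} + \tfrac12 h_0'(0) t \Big) \qquad (2.6) \]

and

\[ n^{1/3} \big( \hat h_n(a) - h_0(a) \big) \xrightarrow{\mathcal{D}} -\inf_{t > 0} \Big( \frac{W(h_0(a) t)}{t} + \frac{\beta}{t} + \tfrac12 h_0'(a) t \Big). \]

(iv) The asymptotically MSE-optimal choices for the penalization parameters are $\alpha_n = \alpha n^{-2/3}$ and
$\beta_n = \beta n^{-2/3}$, where $\alpha > 0$ is the minimizer of

\[ E\Big[ \Big( \min_{t > 0} \Big\{ \tfrac12 h_0'(0) t + \frac{\alpha + W(h_0(0) t)}{t} \Big\} \Big)^2 \Big], \]

and $\beta > 0$ is the minimizer of

\[ E\Big[ \Big( \min_{t > 0} \Big\{ \tfrac12 h_0'(a) t + \frac{\beta + W(h_0(a) t)}{t} \Big\} \Big)^2 \Big]. \]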
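The criterion in part (iv) has no closed form, but it can be approximated by simulation. The sketch below is a rough Monte Carlo illustration under stated assumptions: Brownian motion is simulated on an equidistant grid, the infimum over $t > 0$ is replaced by a minimum over grid points up to a hypothetical cutoff `t_max`, and the function name `mse_criterion` is made up. Discretization and truncation both bias the result, so the output is indicative only.

```python
import numpy as np

def mse_criterion(alpha, h0, h0_prime, n_paths=2000, n_steps=2000, t_max=10.0, seed=0):
    """Monte Carlo approximation of E[(min_t {h0' t / 2 + (alpha + W(h0 t)) / t})^2].

    h0 and h0_prime stand for the values of the hazard and of its derivative at
    the boundary point. The same seed gives common random numbers across alpha.
    """
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    t = dt * np.arange(1, n_steps + 1)
    # W(h0 * t) is Brownian motion with variance h0 * t.
    increments = rng.normal(0.0, np.sqrt(h0 * dt), size=(n_paths, n_steps))
    w = np.cumsum(increments, axis=1)
    values = 0.5 * h0_prime * t + (alpha + w) / t
    return float(np.mean(np.min(values, axis=1) ** 2))

# Crude grid search for the minimizing alpha when h0(0) = h0'(0) = 1.
alphas = [0.1, 0.3, 0.5, 1.0]
criteria = [mse_criterion(al, 1.0, 1.0, n_paths=500, n_steps=1000) for al in alphas]
best_alpha = alphas[int(np.argmin(criteria))]
```

Using the same seed for every `alpha` reduces the Monte Carlo noise in the comparison between penalty constants.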

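Proof. We concentrate on the situation at $x = 0$ with $\alpha_n$ as penalty parameter. The right boundary at $x = a$
with penalty parameter $\beta_n$ can be dealt with similarly.

(i) Fix $\delta > 0$. The local assumption on $h_0$ near zero implies that $x \mapsto H_0(x)/x$ is strictly increasing on $(0, a]$.
Hence

\[ \inf_{x \in [\delta, a]} \frac{H_0(x)}{x} = \frac{H_0(\delta)}{\delta} > \frac{H_0(\delta/2)}{\delta/2}. \qquad (2.7) \]

For convenience, write $\tilde{\mathbb{H}}_n = \mathbb{H}_n + \alpha_n 1_{(0,\infty)} - \beta_n 1_{[a,\infty)}$. By the uniform convergence of $\mathbb{H}_n$ to $H_0$ on $[0, a]$,
$x \mapsto \tilde{\mathbb{H}}_n(x)/x$ converges to $x \mapsto H_0(x)/x$ uniformly on $[\delta/2, a]$. Combined with (2.7), this shows that with
probability one, for all $n$ sufficiently large

\[ \inf_{x \in [\delta, a]} \frac{\tilde{\mathbb{H}}_n(x)}{x} > \frac{\tilde{\mathbb{H}}_n(\delta/2)}{\delta/2}, \quad \text{implying} \quad \hat h_n(0) = \inf_{x \in [0, \delta]} \frac{\tilde{\mathbb{H}}_n(x)}{x}. \]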

(ii) Let $(\alpha_n)$ be given and $(\delta_n)$ be a sequence with $0 < \delta_n \to 0$ and $n\delta_n \to \infty$ as $n \to \infty$. Consider the
localized and centered process

\[ t \mapsto V_n(t) = \frac{\mathbb{H}_n(\delta_n t) - H_0(\delta_n t)}{\delta_n t} + \frac{\alpha_n}{\delta_n t} + \frac{H_0(\delta_n t) - h_0(0)\delta_n t}{\delta_n t} \]

and note that

\[ \hat h_n(0) - h_0(0) = \inf_{t \in (0, \delta/\delta_n]} V_n(t). \]

For fixed $t$,

\[ V_n(t) = (n\delta_n)^{-1/2}\, \frac{W_n(t)}{t} + \frac{\alpha_n}{\delta_n t} + \tfrac12 h_0'(0)\delta_n t + \tfrac12 \big( h_0'(\xi_n) - h_0'(0) \big) \delta_n t \qquad (2.8) \]

where $0 < \xi_n < \delta_n t$ and $W_n(t)$ is an asymptotically non-degenerate random variable. Moreover, for $W$
standard Brownian motion on $[0, \infty)$,

\[ W_n(t) = \sqrt{\frac{n}{\delta_n}}\, \big( \mathbb{H}_n(\delta_n t) - H_0(\delta_n t) \big) \xrightarrow{\mathcal{D}} W(h_0(0) t) \qquad (2.9) \]

in $D([0,\infty))$ endowed with the topology of uniform convergence on compacta. Ignoring the (asymptotically
negligible) last term in (2.8), we see that balancing the two deterministic terms yields $\delta_n \asymp \sqrt{\alpha_n}$; taking $\delta_n$
converging either faster or slower to zero than this will lead to a slower rate of convergence of $V_n$ to zero.
Using this choice, $V_n(t) = O_p\big( n^{-1/2}\alpha_n^{-1/4} + \alpha_n^{1/2} \big)$. This shows that starting off with $\alpha_n \asymp n^{-2/3}$ leads
to the fastest rate of convergence of $V_n(t)$ to zero.
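The balancing argument can be spelled out step by step. This is a worked restatement of the reasoning above, treating $W_n(t)$ as $O_p(1)$ for fixed $t$:

```latex
\frac{\alpha_n}{\delta_n} \asymp \delta_n
  \iff \delta_n \asymp \alpha_n^{1/2},
\qquad
(n\delta_n)^{-1/2} \asymp n^{-1/2}\alpha_n^{-1/4},
\\[4pt]
n^{-1/2}\alpha_n^{-1/4} \asymp \alpha_n^{1/2}
  \iff \alpha_n^{3/4} \asymp n^{-1/2}
  \iff \alpha_n \asymp n^{-2/3},
\qquad
\delta_n \asymp n^{-1/3},
\qquad
V_n(t) = O_p\big(n^{-1/3}\big).
```

The stochastic term and the two deterministic terms are then all of order $n^{-1/3}$, which is the rate appearing in part (iii).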

(iii) We now take $\delta_n = n^{-1/3}$ and $\alpha_n = \alpha n^{-2/3}$ with $\alpha > 0$. Also using (i) and the local assumption on $h_0$
near zero leads for any $\nu > 0$ to the approximate asymptotic representation

\[ n^{1/3} \big( \hat h_n(0) - h_0(0) \big) = \inf_{t \in (0, \nu n^{1/3}]} n^{1/3} V_n(t)
 = \inf_{t \in (0, \nu n^{1/3}]} \Big( \frac{W_n(t)}{t} + \frac{\alpha}{t} + \tfrac12 h_0'(0) t \Big) \]

where ignoring the last term in (2.8) is justified because $\nu$ can be chosen arbitrarily small ($0 < \xi_n < \nu$). In
Lemma A.1 in the appendix we show that by taking $M > 0$ sufficiently large and $\epsilon > 0$ sufficiently small,

\[ n^{1/3} \big( \hat h_n(0) - h_0(0) \big) = \inf_{t \in [\epsilon, M]} \Big( \frac{W_n(t)}{t} + \frac{\alpha}{t} + \tfrac12 h_0'(0) t \Big) \]

with arbitrarily high probability. Together with (2.9), and the fact that for Brownian motion on $[0, \infty)$

\[ \inf_{t > 0} \Big( \frac{W(h_0(0) t)}{t} + \frac{\alpha}{t} + \tfrac12 h_0'(0) t \Big) = \inf_{t \in [\epsilon, M]} \Big( \frac{W(h_0(0) t)}{t} + \frac{\alpha}{t} + \tfrac12 h_0'(0) t \Big) \]

with arbitrarily high probability by taking $\epsilon > 0$ sufficiently small and $M > 0$ sufficiently large, this leads
to (2.6). Finally, the asymptotically MSE-optimal value for $\alpha$ in (iv) is obtained by minimizing the
expectation of the square of the right hand side of (2.6) as a function of $\alpha$. □

Remark 2.2. Theorem 2.1 gives the optimal penalization constants for the case that the hazard is strictly
increasing on $[0,a]$. The situation is quite different if, e.g., the hazard is constant on $[0,b]$ for some $b > 0$.
In view of (2.8), the linear (in $t$) terms are not present, and in order to make $V_n(t)$ as small as possible,
$\delta_n$ should not tend to zero and $\alpha_n$ should be chosen of the order $n^{-1/2}$. So in this case the type of scaling
used in [25] seems the more natural type of scaling. The limit behavior of the greatest convex minorant
which one gets in this case (for $\alpha = 0$) is analyzed in [4], whereas the limit situation for the case that the
greatest convex minorant corresponds to a strictly convex function (where $n^{-2/3}$ is the natural scale of the
penalization constants) is analyzed in [5].

Remark 2.3. For $0 < x < a$, it can be shown, using arguments similar to those used in [3], that under the
assumption that $h_0$ is continuous and strictly positive at $x$,

""'(ij^)"'('"<^'-'»w)^^''- 

where $V = \operatorname{argmax}_t \{W(t) - t^2\}$, with $W$ standard two-sided Brownian motion, has the Chernoff distribution
([1] and [10]). This asymptotic distribution is the same as that of the MLE of an increasing hazard function 
as given in Theorem 6.1 in [18]. 

Having uniform consistency on arbitrarily large intervals in [0, a] staying away from the boundary and 
consistency at the boundary points, monotonicity can be used to get uniform consistency of hn on the whole 
interval [0,a]. We prove a somewhat stronger (not sharp but easy to prove) uniform rate result, that is 
needed in the proof of Theorem 4.2. 

Corollary 2.1. Under the conditions of Lemma 2.1 and Theorem 2.1,

\[ \sup_{x \in [0,a]} |\hat h_n(x) - h_0(x)| = O_p(n^{-1/4}). \qquad (2.10) \]



imsart-generic ver. 2009/08/13 file: IncreasingHazard.tex date: January 12, 2013 



P. Groeneboom and G. Jongbloed / Monotone hazard estimation




Fig 3. The penalized cusum diagram and its greatest convex minorant (left panel) and the penalized least squares estimate of
the hazard (right panel) for a sample of size $n = 100$ from the distribution function $F^{(1)}$ on the interval $[0, (F^{(1)})^{-1}(0.95)]$.
The real hazard is the black curve in the right panel.



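Proof. For $x \in [n^{-1/4}, a - n^{-1/4}]$,

\[ \begin{aligned}
\hat h_n(x) &\le n^{1/4} \big( \hat H_n(x + n^{-1/4}) - \hat H_n(x) \big) \\
&\le 2 n^{1/4} \sup_{y \in [0,a]} |\hat H_n(y) - H_0(y)| + h_0(x) + n^{-1/4} \sup_{y \in [0,a]} h_0'(y) \\
&= h_0(x) + r_n
\end{aligned} \]

where $r_n = O_p(n^{-1/4})$, not depending on $x$. Similarly, $\hat h_n(x) - h_0(x) \ge -r_n$. This leads to the inequality
$\sup_{[n^{-1/4}, a - n^{-1/4}]} |\hat h_n(x) - h_0(x)| \le r_n$. For $x \in [0, n^{-1/4}]$, we have

\[ \begin{aligned}
\hat h_n(x) &\le \hat h_n(n^{-1/4}) = \hat h_n(n^{-1/4}) - h_0(n^{-1/4}) + h_0(n^{-1/4}) - h_0(x) + h_0(x) \\
&\le h_0(x) + r_n + n^{-1/4} \sup_{y \in [0,a]} h_0'(y)
\end{aligned} \]

and, using Theorem 2.1 part (iii),

\[ \begin{aligned}
\hat h_n(x) &\ge \hat h_n(0) = \hat h_n(0) - h_0(0) + h_0(0) - h_0(x) + h_0(x) \\
&\ge h_0(x) + O_p(n^{-1/3}) - n^{-1/4} \sup_{y \in [0,a]} h_0'(y)
\end{aligned} \]

leading to $\sup_{[0, n^{-1/4}]} |\hat h_n(x) - h_0(x)| = O_p(n^{-1/4})$. For $[a - n^{-1/4}, a]$ the result can be derived in the same
way. Combining the three rate results for the suprema leads to (2.10). □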

Taking $\alpha_n = n^{-2/3}$ and $\beta_n = 2n^{-2/3}$ we obtain Figure 3, for the same sample as used in Figure 2, where
one notices that the value of $\hat h_n(0)$ has gone up and the value of $\hat h_n(a)$ has gone down.

3. Monotone kernel estimates of the hazard 

Now suppose we have an initial (non-smooth) monotone estimate of the hazard $\hat h_n$ on $[0, a]$, like the least
squares estimate of the hazard, obtained by minimizing (2.1), or the penalized least squares isotonic estimator,
obtained by minimizing (2.5) under the restriction that $h$ is nondecreasing. One way of constructing a smooth
estimate of the hazard based on $\hat h_n$ is to use kernel smoothing. A kernel estimate with bandwidth $b > 0$ of
the hazard is given by:

\[ \tilde h_n(x) = \int K_b(x - y)\, d\hat H_n(y) = \int K_b(x - y)\, \hat h_n(y)\, dy, \qquad K_b(u) = b^{-1} K(u/b), \qquad (3.1) \]

where $K$ is a kernel with compact support, like the triweight kernel

\[ K(u) = \tfrac{35}{32} (1 - u^2)^3\, 1_{[-1,1]}(u). \]

Note that monotonicity of $\tilde h_n$ follows from monotonicity of $\hat h_n$. This property is not shared by the direct
kernel estimator for $h_0$ that is obtained by taking the empirical cumulative hazard function $\mathbb{H}_n$ instead of
$\hat H_n$ in (3.1). Also the kernel estimators considered in [22], which are ratios of kernel estimators of the density
$f_0$ and estimators of the survival function $1 - F_0$, are not monotone in general. An alternative representation
of our kernel estimate is

\[ \begin{aligned}
\tilde h_n(x) &= \int K_b(x - y) \int_{u < y} d\hat h_n(u)\, dy = \iint_{u < y} K_b(x - y)\, dy\, d\hat h_n(u) \\
&= \int_{u=-\infty}^{\infty} \int_{y=u}^{\infty} K_b(x - y)\, dy\, d\hat h_n(u) = \int_{u=-\infty}^{\infty} \mathbb{K}((x - u)/b)\, d\hat h_n(u)
\end{aligned} \]

for $x \in [0,a]$, where

\[ \mathbb{K}(u) = \int_{-\infty}^{u} K(w)\, dw = \begin{cases} 0, & u < -1, \\ \int_{-1}^{u} K(w)\, dw, & u \in [-1,1], \\ 1, & u > 1. \end{cases} \]

Note that this yields:

\[ \tilde h_n'(x) = \int K_b(x - y)\, d\hat h_n(y), \]

so we also have an estimate of the derivative of the hazard.
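For a piecewise constant initial estimate, the kernel estimate can be evaluated in closed form through the integrated kernel. The sketch below assumes the triweight kernel, whose integral on $[-1,1]$ is a polynomial, and treats the initial estimate as zero outside $[0,a]$, so it is only meaningful for points at distance at least $b$ from the boundary. The names `triweight_cdf` and `smooth_step_hazard` are hypothetical.

```python
import numpy as np

def triweight_cdf(u):
    """Integrated triweight kernel: int_{-1}^{u} (35/32) (1 - w^2)^3 dw, clipped to [0, 1]."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + (35.0 / 32.0) * (u - u ** 3 + 0.6 * u ** 5 - u ** 7 / 7.0)

def smooth_step_hazard(x, knots, values, b):
    """Kernel smooth of a step function with values[j] on (knots[j], knots[j + 1]].

    Uses int_{knots[j]}^{knots[j+1]} K_b(x - y) dy
         = IK((x - knots[j]) / b) - IK((x - knots[j + 1]) / b),
    where IK is the integrated kernel.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]
    knots = np.asarray(knots, dtype=float)
    upper = triweight_cdf((x - knots[:-1]) / b)
    lower = triweight_cdf((x - knots[1:]) / b)
    return (upper - lower) @ np.asarray(values, dtype=float)

# A nondecreasing step function on [0, 2], smoothed on an interior grid.
knots = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
step_values = np.array([0.0, 0.6, 0.8, 1.4])
interior_grid = np.linspace(0.3, 1.7, 50)
smoothed = smooth_step_hazard(interior_grid, knots, step_values, b=0.3)
```

On the interior grid the smoothed values inherit the monotonicity of the step function, in line with the remark above.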

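Going in the other direction, we have the following estimate of the cumulative hazard function:

\[ \tilde H_n(x) = b \int J\mathbb{K}((x - z)/b)\, d\hat h_n(z), \]

where

\[ J\mathbb{K}(u) = \int_{-\infty}^{u} \mathbb{K}(v)\, dv = \begin{cases} 0, & u < -1, \\ \int_{-1}^{u} (u - v) K(v)\, dv, & u \in [-1,1], \\ u, & u > 1. \end{cases} \]

We now have:

\[ \begin{aligned}
\tilde H_n(x) &= \hat H_n(x - b) + b \int_{x-b}^{x+b} J\mathbb{K}((x - z)/b)\, d\hat h_n(z) \\
&= \hat H_n(x - b) + \int_{x-b}^{x+b} \hat h_n(z) \int_{-1}^{(x-z)/b} K(v)\, dv\, dz = \int_{x-b}^{x+b} \hat H_n(z) K_b(x - z)\, dz.
\end{aligned} \]

As the estimate for the density on $[0, a]$, we can take:

\[ \tilde f_n(x) = \tilde h_n(x) \exp\{-\tilde H_n(x)\}. \qquad (3.2) \]

For $x \in (0, a)$, we have the following asymptotic result for $\tilde h_n(x)$.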
Fig 4. The left picture shows the estimates $\hat h_n$ (blue) and $\tilde h_n$ (red) of the hazard $h^{(1)}$ (of the family $\{h^{(d)} : d \in [-1,1]\}$,
black) for a sample of size $n = 100$ on the 95% percentile interval $[0, (F^{(1)})^{-1}(0.95)]$. The middle and right picture show the
corresponding derivatives of the hazard rates and the corresponding densities.

Fig 5. From left to right: the isotonic estimates $\hat h_n$ (blue) and $\tilde h_n$ (red) of the hazard $h^{(-1)}$ (of the family $\{h^{(d)} : d \in [-1, 1]\}$)
for a sample of size $n = 100$, and the real hazard $h^{(-1)}$ (black) on the 95% percentile interval $[0, (F^{(-1)})^{-1}(0.95)]$; the
corresponding derivatives of the hazard rates and the corresponding densities, where we compare in the right panel the estimate
of the density with the density obtained from the isotonic projection of the underlying hazard $h^{(-1)}$.



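Theorem 3.1. Let $\tilde h_n$ be the kernel estimate of the hazard function on $[0,a]$, defined by (3.1). Moreover, let
$h_0$ be twice continuously differentiable, and let $h_0$ and $h_0'$ be both strictly positive on $[0,a]$, where $h_0'(0)$ and
$h_0'(a)$ are defined as right and left derivatives, respectively. Then:

(i) If we choose a bandwidth $b_n$ such that $n^{1/5} b_n \to \nu \in (0, \infty)$, as $n \to \infty$, we have for each $x \in (0, a)$:

\[ n^{2/5} \{\tilde h_n(x) - h_0(x)\} \xrightarrow{\mathcal{D}} N\big(\mu_0(\nu), \sigma_0^2(\nu)\big), \]

where

\[ \mu_0(\nu) = \tfrac12 \nu^2 h_0''(x) \int u^2 K(u)\, du, \qquad \sigma_0^2(\nu) = \frac{h_0(x)^2}{\nu f_0(x)} \int K(u)^2\, du. \qquad (3.3) \]

(ii) The asymptotically locally optimal bandwidth is given by

\[ b_{n,\mathrm{locopt}} = \Big\{ \frac{h_0(x)^2 \int K(u)^2\, du}{f_0(x)\, h_0''(x)^2 \big( \int u^2 K(u)\, du \big)^2} \Big\}^{1/5} n^{-1/5}. \qquad (3.4) \]

(iii) The bandwidth minimizing the asymptotic global least squares criterion (the asymptotic mean squared error
$\mu_0(\nu)^2 + \sigma_0^2(\nu)$ integrated over $x \in [0,a]$) is given by

\[ b_{n,\mathrm{globopt}} = \Big\{ \frac{\int_0^a h_0(x)^2 / f_0(x)\, dx \int K(u)^2\, du}{\int_0^a h_0''(x)^2\, dx\, \big( \int u^2 K(u)\, du \big)^2} \Big\}^{1/5} n^{-1/5}. \qquad (3.5) \]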
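The locally optimal bandwidth is straightforward to evaluate once the kernel constants are known. The sketch below computes $\int u^2 K(u)\,du$ and $\int K(u)^2\,du$ numerically for the triweight kernel and plugs them into the local formula. The function names and the example hazard $h(x) = 1 + x^2$ are assumptions chosen for illustration; in practice the unknown quantities would have to be estimated.

```python
import numpy as np

def triweight(u):
    """Triweight kernel (35/32) (1 - u^2)^3 on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u ** 2) ** 3, 0.0)

def kernel_constants(n_grid=200001):
    """Second moment and integrated square of the triweight kernel (Riemann sums)."""
    u = np.linspace(-1.0, 1.0, n_grid)
    du = u[1] - u[0]
    k = triweight(u)
    return np.sum(u ** 2 * k) * du, np.sum(k ** 2) * du

def local_bandwidth(n, h0, h0_second, f0):
    """Locally optimal bandwidth: {h0^2 R / (f0 h0''^2 m2^2)}^(1/5) n^(-1/5)."""
    m2, roughness = kernel_constants()
    return (h0 ** 2 * roughness / (f0 * h0_second ** 2 * m2 ** 2)) ** 0.2 * n ** (-0.2)

# Hypothetical example: h(x) = 1 + x^2, so H(x) = x + x^3 / 3 and h''(x) = 2.
x0 = 0.5
hazard_at_x0 = 1.0 + x0 ** 2
density_at_x0 = hazard_at_x0 * np.exp(-(x0 + x0 ** 3 / 3.0))
bandwidth_100 = local_bandwidth(100, hazard_at_x0, 2.0, density_at_x0)
bandwidth_3200 = local_bandwidth(3200, hazard_at_x0, 2.0, density_at_x0)
```

Because of the $n^{-1/5}$ rate, multiplying the sample size by 32 halves the bandwidth.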
Proof. (i): We get:

\[ \begin{aligned}
\int K_{b_n}(x - y)\, d\hat H_n(y) &= \int K_{b_n}(x - y)\, d\mathbb{H}_n(y) + \int K_{b_n}(x - y)\, d(\hat H_n - \mathbb{H}_n)(y) \\
&= \int K_{b_n}(x - y)\, d\mathbb{H}_n(y) + \int \{\hat H_n(y) - \mathbb{H}_n(y)\}\, b_n^{-2} K'((x - y)/b_n)\, dy \\
&= \int K_{b_n}(x - y)\, d\mathbb{H}_n(y) + b_n^{-1} \int \{\hat H_n(x - b_n u) - \mathbb{H}_n(x - b_n u)\}\, K'(u)\, du \\
&= \int K_{b_n}(x - y)\, d\mathbb{H}_n(y) + O_p\big(n^{-7/15} \log n\big),
\end{aligned} \]

where we use that

\[ \sup_{x \in [0,a]} |\hat H_n(x) - \mathbb{H}_n(x)| = O_p\big(n^{-2/3} \log n\big). \]

This result is related to that in [13] for the concave majorant of the empirical distribution based on a sample
from a concave distribution function. It can be proved along the lines of [16]. Moreover,

\[ \int K_{b_n}(x - y)\, d\mathbb{H}_n(y) = \int K_{b_n}(x - y)\, dH_0(y) + \int K_{b_n}(x - y)\, d(\mathbb{H}_n - H_0)(y). \]

Define

\[ W_n(u) = \sqrt{n/b_n}\, \big( \mathbb{H}_n(x + b_n u) - \mathbb{H}_n(x) - H_0(x + b_n u) + H_0(x) \big) \]



^— r f l-¥,,{x + bnu) \ 



log 



1 - fo(x + bnU) 

1 - Fo{x) 



- v/^{-log(l-^ 



¥„{x + bnU)-¥n{x] 



+ log 1 



¥n{x) 

Fq{x + bnu) - Fo{x) 
l-Fo{x) 

Tn{u) 



log 1 



l-Fo(x) 



V \ 1 iT-r^ - :j ETT^ > + Op 1 = bn— Trr\ ^ 

[1-F„(a;) l-Fo(a;)J 1 - Fo{x) 



+ \/n/bn (Tn(u) - tn{u)) 



Fn(x)-fo(x) 
{l~Fo{x)){l-Vr.{x)) 



+ 



, r~rr^ ( \ ^■n{x)-FQ{x) 

+ y/n bntn{u)- ^ . + Op(l) 

= ^Jn b,,— + Op 1 

1 - Fo(a;) 

where the order terms are uniform for u in compact sets. Using that 



^^7^(^„(M) - i„(u)) = y 5-1/2 _ lfo_,,(y)) dy^(F„ - Fo)(y) 

where $W$ is standard two-sided Brownian motion on $\mathbb{R}$, we obtain



^Hr I \ ^/ h{x) sV ho{x) 



l-Fo(x) 
Take $b_n = \nu n^{-1/5}$ and note that

\[ n^{2/5} \int K_{b_n}(x - y)\, d(\mathbb{H}_n - H_0)(y) = n^{2/5} b_n^{-1} \int K(u)\, d(\mathbb{H}_n - H_0)(x + b_n u) = \nu^{-1/2} \int K(u)\, dW_n(u). \]

The asymptotic bias is given by

\[ n^{2/5} \Big\{ \int K_{b_n}(x - y) h_0(y)\, dy - h_0(x) \Big\} = n^{2/5} \int K(u) \{h_0(x + b_n u) - h_0(x)\}\, du \to \tfrac12 h_0''(x) \nu^2 \int u^2 K(u)\, du. \]

So we obtain

\[ n^{2/5} \{\tilde h_n(x) - h_0(x)\} \xrightarrow{\mathcal{D}} N\big(\mu_0(\nu), \sigma_0^2(\nu)\big), \]

where $\mu_0(\nu)$ and $\sigma_0^2(\nu)$ are given in (3.3). The last two statements of the theorem follow easily by setting
the derivative with respect to $\nu$ equal to zero in, respectively, the local and global criterion. □

Pictures for $n = 100$ of $\tilde h_n$, its derivative $\tilde h_n'$ and the density $\tilde f_n$ for the corresponding functions at the
right end of the family (2.2) are shown in Figure 4, where the globally optimal bandwidth for the hazard,
given in (3.5), is used. The same pictures for the left end of the family (where the hazard is not monotone)
are shown in Figure 5.

For purposes of bootstrapping of the test statistics in [9], a crucial feature is that the estimate of the 
derivative of the hazard stays away from zero, also at the boundary points. This behavior can be shown 
under the hypotheses of Theorem 3.1, even at the boundary points. To obtain a consistent estimate of 
the derivative at the boundary points, one could introduce a boundary kernel. For example, near the left 
boundary point one could take: 

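\[ \tilde h_n'(x) = \alpha(x/b) \int K_b(x - y)\, d\hat h_n(y) + \beta(x/b) \int \frac{x - y}{b} K_b(x - y)\, d\hat h_n(y), \]

where $K_b(u) = b^{-1} K(u/b)$,
and $\alpha(x/b)$ and $\beta(x/b)$ are chosen in such a way that, if $y \in [0, 1]$,

\[ \alpha(y) \int_{-1}^{y} K(u)\, du + \beta(y) \int_{-1}^{y} u K(u)\, du = 1 \quad \text{and} \quad \alpha(y) \int_{-1}^{y} u K(u)\, du + \beta(y) \int_{-1}^{y} u^2 K(u)\, du = 0, \]

and where $\alpha(y) = 1$, $\beta(y) = 0$, if $y \ge 1$. This will indeed lead to consistent estimates of the derivative of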
the hazard at the boundary, but the disadvantage is that the relation between hn and its derivative via 
derivatives and integrals of the kernel, which we used above, is lost. In generating the bootstrap samples in 
[9] , using a kernel estimate of the hazard, boundary kernels were not used for estimating the derivative at the 
boundary, since it did not lead to significantly different results, and destroyed the simple relation between 
the hazard and its derivative via the kernel. 
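The two conditions on the boundary kernel coefficients form a 2 x 2 linear system in $\alpha(y)$ and $\beta(y)$. The sketch below solves it for the triweight kernel, assuming that the partial moments are taken over $[-1, y]$ (the integration limits are an assumption about the boundary kernel construction); the function names are hypothetical.

```python
import numpy as np

def triweight(u):
    """Triweight kernel (35/32) (1 - u^2)^3 on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, (35.0 / 32.0) * (1.0 - u ** 2) ** 3, 0.0)

def partial_moment(k, y, n_grid=20001):
    """Trapezoidal approximation of int_{-1}^{y} u^k K(u) du."""
    u = np.linspace(-1.0, y, n_grid)
    vals = u ** k * triweight(u)
    return np.sum((vals[1:] + vals[:-1]) * np.diff(u)) / 2.0

def boundary_coefficients(y):
    """Solve alpha*a0 + beta*a1 = 1 and alpha*a1 + beta*a2 = 0 for (alpha, beta)."""
    if y >= 1.0:
        return 1.0, 0.0
    a0, a1, a2 = (partial_moment(k, y) for k in range(3))
    det = a0 * a2 - a1 ** 2
    return a2 / det, -a1 / det

alpha_0, beta_0 = boundary_coefficients(0.0)    # at the boundary point itself
alpha_1, beta_1 = boundary_coefficients(1.0)    # one bandwidth away from the boundary
```

At `y = 1` the correction disappears, so the boundary kernel reduces to the ordinary kernel away from the boundary.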



4. Smooth estimates of the hazard, based on penalization 

Another approach to obtain a smooth monotone estimate of the hazard is that of penalized least squares;
see e.g. [23] and [17]. Let $\lambda > 0$ be a penalty parameter and define the smooth penalized local least squares
estimator of $h$ on $[0, a]$ as minimizer of

\[ \Phi_\lambda(h) = \int_0^a \big( h(x) - \hat h_n(x) \big)^2\, dx + \lambda \int_0^a h'(x)^2\, dx \qquad (4.1) \]

over the set of differentiable functions $h$ on $[0, a]$, where $\hat h_n$ is the monotone (on $[0,a]$) piecewise constant
estimate that minimizes (2.5) in section 2. Our first lemma gives the minimizer of $\Phi_\lambda$ over the class of smooth
functions on $[0, a]$ under boundary constraints at $0$ and $a$.

Lemma 4.1. Let $\kappa_1, \kappa_2 \in \mathbb{R}$. Then the unique minimizer of $\Phi_\lambda$ over all smooth functions on $[0, a]$ such that
$h(0) = \kappa_1$ and $h(a) = \kappa_2$ exists and is given by

\[ h(x) = h_1(x) + c_1 e^{-x/\sqrt{\lambda}} + c_2 e^{-(a-x)/\sqrt{\lambda}} \qquad (4.2) \]

where

\[ h_1(x) = \frac{1}{2\sqrt{\lambda}} \int_0^a e^{-|y - x|/\sqrt{\lambda}}\, \hat h_n(y)\, dy, \qquad (4.3) \]

and $c_1$ and $c_2$ are chosen such that $h$ satisfies the imposed boundary constraints.
Proof. Writing

\[ I(h) = \int_0^a G(x, h, h')\, dx = \int_0^a \{h(x) - \hat h_n(x)\}^2\, dx + \lambda \int_0^a h'(x)^2\, dx, \]

we get Euler's differential equation

\[ G_h - \frac{d}{dx} G_{h'} = 0, \]

which we wish to solve under the boundary conditions $h(0) = \kappa_1$ and $h(a) = \kappa_2$. This results in the second
order differential equation

\[ h''(x) = \lambda^{-1} \{h(x) - \hat h_n(x)\} \qquad (4.4) \]

with boundary constraints.

A particular solution to (4.4) is given by (4.3). Adding the solutions to the homogeneous equation multiplied
by constants $c_1$ and $c_2$ respectively, the unique solution to the boundary value problem is obtained by
choosing $c_1$ and $c_2$ appropriately in (4.2). □

Remark 4.1. Observe that $h_1$ in (4.3) can be viewed as a kernel-smoothed version of $\hat h_n$ in the sense of
section 3, with kernel function $K(x) = \frac12 \exp(-|x|)$ and bandwidth $b = \sqrt{\lambda}$. In particular this shows $h_1$ to be
monotone. Moreover, for $\lambda \downarrow 0$ and $c_1, c_2$ bounded as $\lambda \downarrow 0$, $h$ defined in (4.2) is merely a boundary-corrected
version of $h_1$. In that case the asymptotic behavior of $h$ on closed intervals excluding the boundary points
$0$ and $a$ is completely determined by that of $h_1$.

As an immediate consequence of Lemma 4.1, the minimizer of $\Phi_\lambda$ without boundary restrictions can be
identified, as well as the minimizer under the natural boundary constraints $h(0) = \hat h_n(0)$ and $h(a) = \hat h_n(a)$.
These latter boundary constraints are natural in view of the consistency of $\hat h_n$ at $0$ and $a$.

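Corollary 4.1. The unique minimizer $\check h_n$ of $\Phi_\lambda$ over all smooth functions on $[0,a]$ exists and is given by (4.2)
with $c_1$ equal to

\[ \check c_1 = \frac{\int_0^a \{\hat h_n(x) - h_1(x) + \sqrt{\lambda}\, h_1'(x)\}\, e^{-x/\sqrt{\lambda}}\, dx}{\sqrt{\lambda}\, \{1 - e^{-2a/\sqrt{\lambda}}\}} \qquad (4.5) \]

and $c_2$ equal to

\[ \check c_2 = \frac{\int_0^a \{\hat h_n(x) - h_1(x) - \sqrt{\lambda}\, h_1'(x)\}\, e^{-(a-x)/\sqrt{\lambda}}\, dx}{\sqrt{\lambda}\, \{1 - e^{-2a/\sqrt{\lambda}}\}}. \qquad (4.6) \]

The minimizer $\bar h_n$ of $\Phi_\lambda$ under the boundary constraints $h(0) = \hat h_n(0)$ and $h(a) = \hat h_n(a)$ is given by (4.2)
with $c_1$ equal to

\[ \bar c_1 = \frac{\hat h_n(0) - h_1(0) - \{\hat h_n(a) - h_1(a)\}\, e^{-a/\sqrt{\lambda}}}{1 - e^{-2a/\sqrt{\lambda}}} \qquad (4.7) \]

and $c_2$ equal to

\[ \bar c_2 = \frac{\hat h_n(a) - h_1(a) - \{\hat h_n(0) - h_1(0)\}\, e^{-a/\sqrt{\lambda}}}{1 - e^{-2a/\sqrt{\lambda}}}. \qquad (4.8) \]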

Proof. The parameters $c_1$ and $c_2$ of the unconstrained minimizer are found by differentiating the criterion evaluated at (4.2) with respect
to $c_1$ and $c_2$. Differentiation w.r.t. $c_1$ yields

\[ \int_0^a \{h(x) - \hat h_n(x) - \sqrt{\lambda}\, h'(x)\}\, e^{-x/\sqrt{\lambda}}\, dx = 0 \]

and differentiation w.r.t. $c_2$ yields

\[ \int_0^a \{h(x) - \hat h_n(x) + \sqrt{\lambda}\, h'(x)\}\, e^{-(a-x)/\sqrt{\lambda}}\, dx = 0, \]

where the dependence on $c_1$ and $c_2$ in the equations is implicit via $h$ and $h'$. From this (4.5) and (4.6) follow.
To get (4.7) and (4.8), $c_1$ and $c_2$ are chosen in (4.2) to satisfy the imposed boundary constraints. □
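The boundary-constrained minimizer can be computed directly from (4.2), (4.3), (4.7) and (4.8). The following sketch is an illustration under assumptions: the initial estimate is supplied on a grid starting at zero, the integral in (4.3) is approximated by the trapezoidal rule, and the names `h1` and `constrained_smoother` are hypothetical.

```python
import numpy as np

def h1(x, grid, h_hat, lam):
    """(4.3): (1 / (2 sqrt(lam))) int_0^a exp(-|y - x| / sqrt(lam)) h_hat(y) dy (trapezoidal rule)."""
    s = np.sqrt(lam)
    integrand = np.exp(-np.abs(grid - x) / s) * h_hat
    return np.sum((integrand[1:] + integrand[:-1]) * np.diff(grid)) / (4.0 * s)

def constrained_smoother(grid, h_hat, lam):
    """Evaluate (4.2) on the grid with c1, c2 taken from (4.7) and (4.8).

    grid is assumed to run from 0 to a; h_hat holds the initial estimate on the grid.
    """
    s = np.sqrt(lam)
    a = grid[-1]
    h1_vals = np.array([h1(x, grid, h_hat, lam) for x in grid])
    q = np.exp(-a / s)
    d0 = h_hat[0] - h1_vals[0]      # h_hat(0) - h1(0)
    da = h_hat[-1] - h1_vals[-1]    # h_hat(a) - h1(a)
    c1 = (d0 - da * q) / (1.0 - q ** 2)
    c2 = (da - d0 * q) / (1.0 - q ** 2)
    return h1_vals + c1 * np.exp(-grid / s) + c2 * np.exp(-(a - grid) / s)

# Illustration with a linear (hence monotone) initial estimate on [0, 2].
grid = np.linspace(0.0, 2.0, 401)
initial = 0.5 + grid
smooth = constrained_smoother(grid, initial, lam=0.01)
```

By construction the smoothed curve matches the initial estimate at both endpoints, while in the interior it is close to the exponential-kernel smooth $h_1$.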

The major part of the asymptotic behavior of the smoothness-penalized estimators is related to the asymptotics
of $h_1$. The lemma below establishes uniform consistency of $h_1$.

Lemma 4.2. Let $\hat h_n$ be the (possibly boundary-penalized) least squares estimator of section 2, where $\alpha_n, \beta_n \downarrow
0$. Let $h_1$ be defined by (4.3). Then, for $\lambda = \lambda_n \downarrow 0$ and $(\log n)^2 \lambda \to 0$, we have for all $0 < \delta < a/2$

\[ \sup_{x \in [\delta, a-\delta]} |h_1(x) - h_0(x)| = O_p(n^{-1/4}). \]

If, moreover, $\alpha_n$ and $\beta_n$ satisfy the conditions of Corollary 2.1, then for $x = 0$ and $x = a$, $h_1(x) \to \tfrac12 h_0(x)$
with probability one.

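Proof. Note that for each $x \in [\delta, a - \delta]$

\[ \begin{aligned}
|h_1(x) - h_0(x)| &= \tfrac12 \lambda^{-1/2} \Big| \int_0^a e^{-|x-y|/\sqrt{\lambda}}\, \hat h_n(y)\, dy - \int_{-\infty}^{\infty} e^{-|x-y|/\sqrt{\lambda}}\, h_0(x)\, dy \Big| \\
&\le \sup_{z \in [\delta, a-\delta]} |\hat h_n(z) - h_0(z)| + \tfrac12 \lambda^{-1/2} h_0(x) \Big( \int_{-\infty}^{0} + \int_{a}^{\infty} \Big) e^{-|x-y|/\sqrt{\lambda}}\, dy \\
&\le \sup_{z \in [\delta, a-\delta]} |\hat h_n(z) - h_0(z)| + h_0(a) \Big( \int_{-\infty}^{-x/\sqrt{\lambda}} + \int_{(a-x)/\sqrt{\lambda}}^{\infty} \Big) e^{-|v|}\, dv \to 0
\end{aligned} \]

in probability as $n \to \infty$, where the upper bound is uniform in $x \in [\delta, a - \delta]$. Here we use Lemma 2.1.
Now consider $x = 0$. We have

\[ h_1(0) = \tfrac12 \lambda^{-1/2} \int_0^a e^{-y/\sqrt{\lambda}}\, \hat h_n(y)\, dy = \tfrac12 \int_0^{a/\sqrt{\lambda}} \hat h_n(y\sqrt{\lambda})\, e^{-y}\, dy \to \tfrac12 h_0(0) \]

a.s. as $n \to \infty$. For $x = a$ the result follows analogously. □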

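In the lemma below, we investigate the asymptotics of the constants $c_1$ and $c_2$ in Lemma 4.1 as $\lambda = \lambda_n \downarrow 0$.

Lemma 4.3. Let $\hat h_n$ be the boundary-penalized least squares estimator of section 2. Let $\alpha_n$ and $\beta_n$ be of the
order $n^{-2/3}$. Then, for $\lambda \downarrow 0$, the coefficients (4.7) and (4.8) of the boundary-constrained minimizer satisfy

\[ \bar c_1 = \hat h_n(0) - h_1(0) + O_p\big(e^{-a/\sqrt{\lambda}}\big), \qquad \bar c_2 = \hat h_n(a) - h_1(a) + O_p\big(e^{-a/\sqrt{\lambda}}\big), \]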
Fig 6. The left panel shows the estimates $\hat h_n$ (blue), $\check h_n$ (red) of the hazard $h^{(1)}$ (of the family $\{h^{(d)} : d \in [-1,1]\}$) for a
sample of size $n = 100$, and the real hazard $h^{(1)}$ (black) on the 95% percentile interval $[0, (F^{(1)})^{-1}(0.95)]$; the corresponding
derivatives of the hazard rates and the densities are shown in the middle and right pictures.



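\[ \check c_1 = \int_0^{a/\sqrt{\lambda}} e^{-x}\, \hat h_n(x\sqrt{\lambda})\, dx - \int_{x=0}^{a/\sqrt{\lambda}} e^{-2x} \int_{y=0}^{x} \hat h_n(y\sqrt{\lambda})\, e^{y}\, dy\, dx + o_p(1) \qquad (4.9) \]

and

\[ \check c_2 = \int_0^{a/\sqrt{\lambda}} e^{-x}\, \hat h_n(a - x\sqrt{\lambda})\, dx - \int_{x=0}^{a/\sqrt{\lambda}} e^{-2x} \int_{y=0}^{x} \hat h_n(a - y\sqrt{\lambda})\, e^{y}\, dy\, dx + o_p(1). \qquad (4.10) \]

Consequently, for $\lambda \downarrow 0$ and under the conditions of Corollary 2.1, $\check c_1, \bar c_1 \to \tfrac12 h_0(0)$ and $\check c_2, \bar c_2 \to \tfrac12 h_0(a)$
in probability.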
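Proof. For $\bar c_1$ and $\bar c_2$ the result immediately follows from (4.7) and (4.8) and Corollary 2.1. For $\tilde c_1$ note that

\[
h_1'(x) = \frac{1}{2\lambda}\int_x^a \hat h_n(y)\,e^{(x-y)/\sqrt{\lambda}}\,dy - \frac{1}{2\lambda}\int_0^x \hat h_n(y)\,e^{-(x-y)/\sqrt{\lambda}}\,dy,
\]

implying (4.9). Using that

\[
\lambda^{-1/2}h_1(x) + h_1'(x) = \frac{1}{\lambda}\int_x^a \hat h_n(y)\,e^{(x-y)/\sqrt{\lambda}}\,dy,
\]

(4.10) follows similarly. The last statements on the convergence in probability of the $c_i$'s use Lemma 4.2. For $\tilde c_1$, note that the second term in (4.9) can be written as

\[
\int_{x=0}^{a/\sqrt{\lambda}} e^{-2x}\int_{y=0}^{x} \hat h_n(y\sqrt{\lambda})\,e^{y}\,dy\,dx = \frac12\int_0^{a/\sqrt{\lambda}} \hat h_n(y\sqrt{\lambda})\left(e^{-y} - e^{y-2a/\sqrt{\lambda}}\right)dy.
\]

□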
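The interchange of the order of integration used at the end of this proof, which turns $\int_{x=0}^{A} e^{-2x}\int_{y=0}^{x} g(y)e^{y}\,dy\,dx$ into $\frac12\int_0^{A} g(y)(e^{-y} - e^{y-2A})\,dy$ with $A = a/\sqrt{\lambda}$ and $g(y) = \hat h_n(y\sqrt{\lambda})$, can be verified numerically. The sketch below is an illustration under stated assumptions: the function `g`, the value of `upper` and all helper names are hypothetical, and a smooth function is used in place of the piecewise constant estimator.

```python
import numpy as np

def trapezoid(values, grid):
    """Composite trapezoid rule on an increasing grid."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

def cumulative_trapezoid(values, grid):
    """Running trapezoid integral, equal to 0 at grid[0]."""
    steps = (values[1:] + values[:-1]) * np.diff(grid) / 2.0
    return np.concatenate(([0.0], np.cumsum(steps)))

def double_integral(g, upper, m=50001):
    """int_0^upper e^{-2x} int_0^x g(y) e^{y} dy dx."""
    x = np.linspace(0.0, upper, m)
    inner = cumulative_trapezoid(g(x) * np.exp(x), x)
    return trapezoid(np.exp(-2.0 * x) * inner, x)

def single_integral(g, upper, m=50001):
    """(1/2) int_0^upper g(y) (e^{-y} - e^{y - 2 * upper}) dy."""
    y = np.linspace(0.0, upper, m)
    return 0.5 * trapezoid(g(y) * (np.exp(-y) - np.exp(y - 2.0 * upper)), y)

# Hypothetical stand-in for y -> hat h_n(y * sqrt(lambda)) (assumption).
g = lambda y: 1.0 + 0.1 * y
upper = 5.0  # plays the role of a / sqrt(lambda)

lhs = double_integral(g, upper)
rhs = single_integral(g, upper)
print(lhs, rhs)  # the two values agree up to discretisation error
```

Since the correction term $e^{y-2A}$ is exponentially small away from $y = A$, the right-hand side is close to $\frac12\int_0^A g(y)e^{-y}\,dy$, which is the quantity controlled by Lemma 4.2.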

Pictures for $n = 100$ of the estimate $\tilde h_n$, its derivative and the density $\tilde f_n$ are shown in Figure 6, where $\lambda = 0.10$. The same pictures for (the boundary-constrained) $\bar h_n$ are shown in Figure 7.

These pictures suggest that $\bar h_n$ behaves better than $\tilde h_n$. The following two results confirm this asymptotically. Theorem 4.1 shows that $\tilde h_n$ and $\bar h_n$ both estimate $h_0$ uniformly consistently. Theorem 4.2 states that $\bar h_n'$ does estimate $h_0'$ consistently on the interval $[0, a]$, whereas $\tilde h_n'$ is inconsistent at the boundaries $0$ and $a$.

Fig 7. The same pictures as in Figure 6, but with the boundary constrained $\bar h_n$ instead of $\tilde h_n$.



Theorem 4.1. Let $\alpha_n, \beta_n \asymp n^{-2/3}$, let $\hat h_n$ be the nondecreasing minimizer of (2.5), and let $\lambda = \lambda_n \to 0$ as $n \to \infty$. Furthermore, assume that $h_0$ is continuously differentiable on $(0, a)$, with finite right and left limits at $0$ and $a$, respectively. Then, if $\tilde h_n$ and $\bar h_n$ are the minimizers of Corollary 4.1, we have for each $x \in [0,a]$:

\[
\sup_{[0,a]} |\tilde h_n(x) - h_0(x)| \quad\text{and}\quad \sup_{[0,a]} |\bar h_n(x) - h_0(x)| \stackrel{p}{\longrightarrow} 0, \qquad n \to \infty.
\]

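Theorem 4.2. Let $\tilde h_n$ and $\bar h_n$ be the boundary constrained minimizers of Corollary 4.1. Then, under the conditions of Theorem 4.1 and n^/^ $\lambda_n \to \infty$, we have for each $0 < \delta < a/2$ that

\[
\sup_{[\delta,a-\delta]} |\tilde h_n'(x) - h_0'(x)| \quad\text{and}\quad \sup_{[\delta,a-\delta]} |\bar h_n'(x) - h_0'(x)| \stackrel{p}{\longrightarrow} 0, \qquad n \to \infty. \tag{4.12}
\]

Moreover, $\tilde h_n'(0) \stackrel{p}{\longrightarrow} 0$ and $\bar h_n'(0) \stackrel{p}{\longrightarrow} h_0'(0)$.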



Appendix A: Appendix Section 

Lemma A.1. Let $W_n$ be as defined in (2.9) and assume the conditions of Theorem 2.1. Consider the process

[0, yn^"") 3 t ^ y„(i) = n^/^KW = + J+ '^h'oiUt 

where $\xi_n \in [0, \nu]$. Then, by choosing $\varepsilon > 0$ sufficiently small and $M > 0$ sufficiently large, for all large $n$

\[
\inf_{t \in [0,\varepsilon]} V_n(t) > V_n(1) \quad\text{and}\quad \inf_{t \in [M,\nu n^{1/3}]} V_n(t) > V_n(1) \tag{A.1}
\]

with probability arbitrarily close to one. This implies that with probability arbitrarily close to one for all large $n$

\[
\inf_{t \in [0,\nu n^{1/3}]} V_n(t) = \inf_{t \in [\varepsilon,M]} V_n(t).
\]

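Proof. From (2.9) it follows that $V_n(1) = O_p(1)$. Also from (2.9), we have that for any $\varepsilon > 0$,

\[
\inf_{[0,\varepsilon]} W_n(t) \stackrel{d}{\longrightarrow} \inf_{[0,\varepsilon]} W(h_0(0)t) \stackrel{d}{=} -\sqrt{h_0(0)\varepsilon}\,|Z|
\]

where $Z \sim N(0, 1)$. Hence, with probability arbitrarily close to one,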
inf[o,£] Wn{t) > -a/2, implying 

\Vn{t)\ > ^-^ > ^ for i G (0,e], 



proving the first statement in (A.l). 
\[
P\Bigl(\inf_{[M,\nu n^{1/3}]} V_n(t) < c\Bigr) \le \sum_{j \ge M} P\Bigl(\inf_{[j,j+1]} V_n(t) < c\Bigr)
\]



can be made arbitrarily smaU by taking M > sufficiently large. Fix C > and take Af > sufficiently 
large such that for all t 
[0, v] and taking j > M, 



large such that for aU t > M, {t + I) - a/C - \h'Q(Q)t^ < -^^.[((O)*^. Then, using that h'fj{x) > 5/10(0) on 



inf Vn{t) <C^3te + 1] : 14(i) < C 
[ij+i] 

^3te + 1] : W.n{t)<Ct-a~ {Ch'^m^ < ~ICh'„{0)t^ 



,2/3 



,2/3 



where in the last implication we use that $|\log(1-u) - \log(1-v)| \le 2|u-v|$ for $0 \le u, v \le 1/2$. Using
Markov's inequality, we obtain for $j > M$



< 



P ( inf Vnit) <c] <P{ sup F„(n-^/3^) - Foin-^^h) 



> 



10 



n4/3^sup[^j+i] |F„(n-i/3t) _ Fn{n-^/H)\ 



< 100 



//3iJsup[^.^-+ij |F„(n-i/3t) _ Fo(n-i/3i)| 



C2/l[,(0)2j4 

By maximal inequality 3.1(ii) in [14], the numerator in this expression is bounded by C'{j + 1), giving 

1/3 

/ \ / \ innr' °° 



[M,i/ni/3] 



C2/i^,(0)^ 



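which can be made arbitrarily small by taking $M$ sufficiently large. □

Proof of Theorem 4.1. For $x \in [0, a]$, we have for $h_n$ either $\tilde h_n$ or $\bar h_n$,

\[
\begin{aligned}
|h_n(x) - h_0(x)| &\le \frac12\int_{-x/\sqrt{\lambda}}^{(a-x)/\sqrt{\lambda}} \bigl|\hat h_n(x + \sqrt{\lambda}v) - h_0(x + \sqrt{\lambda}v)\bigr|\,e^{-|v|}\,dv \\
&\quad + \frac12\left|\int_{-x/\sqrt{\lambda}}^{(a-x)/\sqrt{\lambda}} \bigl(h_0(x + \sqrt{\lambda}v) - h_0(x)\bigr)\,e^{-|v|}\,dv\right| \\
&\quad + \left|h_0(x)\left(\frac12\int_{-x/\sqrt{\lambda}}^{(a-x)/\sqrt{\lambda}} e^{-|v|}\,dv - 1\right) + c_1e^{-x/\sqrt{\lambda}} + c_2e^{-(a-x)/\sqrt{\lambda}}\right| \\
&= I^{(1)} + I^{(2)} + I^{(3)}.
\end{aligned}
\]

First, observe that by Corollary 2.1 and the assumed smoothness of $h_0$, for $n \to \infty$

\[
I^{(1)} \le \sup_{[0,a]} |\hat h_n(y) - h_0(y)| \quad\text{and}\quad I^{(2)} \le \frac12\sup_{[0,a]} |h_0'(y)|\,\sqrt{\lambda}\int_{-\infty}^{\infty} |v|\,e^{-|v|}\,dv \to 0,
\]

where the upper bounds do not depend on $x$. Furthermore, note that

\[
\begin{aligned}
I^{(3)} &= \left|c_1e^{-x/\sqrt{\lambda}} + c_2e^{-(a-x)/\sqrt{\lambda}} - \tfrac12 h_0(x)\left(\int_{-\infty}^{-x/\sqrt{\lambda}} + \int_{(a-x)/\sqrt{\lambda}}^{\infty}\right) e^{-|v|}\,dv\right| \\
&\le \bigl|c_1 - \tfrac12 h_0(x)\bigr|\,e^{-x/\sqrt{\lambda}} + \bigl|c_2 - \tfrac12 h_0(x)\bigr|\,e^{-(a-x)/\sqrt{\lambda}}.
\end{aligned}
\]

Therefore, for $0 \le x \le \lambda^{1/4} < a/2$, we have for $I^{(3)}$

\[
\begin{aligned}
I^{(3)} &\le \bigl|c_1 - \tfrac12 h_0(0) + \tfrac12\bigl(h_0(0) - h_0(x)\bigr)\bigr| + \bigl(|c_2| + \tfrac12 h_0(a)\bigr)e^{-a/(2\sqrt{\lambda})} \\
&\le \bigl|c_1 - \tfrac12 h_0(0)\bigr| + \tfrac12\lambda^{1/4}\sup_{[0,a]} |h_0'(y)| + \bigl(|c_2| + \tfrac12 h_0(a)\bigr)e^{-a/(2\sqrt{\lambda})} \stackrel{p}{\longrightarrow} 0
\end{aligned}
\]

by (4.9), where this upper bound is again independent of $x \in [0, \lambda^{1/4}]$. For $x \in [a - \lambda^{1/4}, a]$ a similar argument yields an upper bound that does not depend on $x$ and converges to zero in probability. For $\lambda^{1/4} < x < a - \lambda^{1/4}$, we have

\[
I^{(3)} \le \bigl(|c_1| + \tfrac12 h_0(a)\bigr)e^{-1/\lambda^{1/4}} + \bigl(|c_2| + \tfrac12 h_0(a)\bigr)e^{-1/\lambda^{1/4}} = O_p\bigl(e^{-1/\lambda^{1/4}}\bigr),
\]

again with an upper bound not depending on $x$. These inequalities, combined with Lemma 4.3, lead to (4.12).
□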

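Proof of Theorem 4.2. Using the expression for $h_1'$ implicit in (4.11), we get for $x \in (0, a)$

\[
h_1'(x) = \frac{1}{2\sqrt{\lambda}}\left\{\int_0^{(a-x)/\sqrt{\lambda}} \hat h_n(x + y\sqrt{\lambda})\,e^{-y}\,dy - \int_0^{x/\sqrt{\lambda}} \hat h_n(x - y\sqrt{\lambda})\,e^{-y}\,dy\right\}.
\]

Fix $0 < \delta < a/2$. Using Corollary 2.1,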



Vn = sup 

xe[0,a] 



hn{x)-ho{x)\^Op , 



(A.2) 



we can write (for $\delta < x < a/2$; the situation $a/2 < x < a - \delta$ is similar)
\h[{x)-h',ix)\ < 



2VA 



+ h'^{x)e-'=/^+^ 



(a — x) I \f\ 



2VA 



/Va 

xj^/X 



h„{x + y\' \)e ^dy 



\/A [0,(5 



<^+sn^h'^{y)e-"^^ 



{y-l)h'^{x)+y^^\snv\K{z)\ e"^ 

[0,a] / 



The first three terms in the upper bound are $o_p(1)$, uniformly in $x$, where we use that $\lambda$ does not converge
to zero too rapidly. The same holds for the last term, since it is bounded by

/•OO /"OO 

sup/i[,(z) / \y-l\e-y dy + ^/\sn^\h'^{z)\ y^ e'^ dy. 



[0,a] 



[0,a\ 



Also using Lemma 4.3, this proves (4.12). 

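Now consider the situation at zero. First for the estimator $\tilde h_n$. Note that, using (4.11) and Lemma 4.3,

\[
\begin{aligned}
\tilde h_n'(0) &= h_1'(0) - \frac{\tilde c_1}{\sqrt{\lambda}} + o_p(1) \\
&= \frac{1}{2\lambda}\int_0^a \hat h_n(x)\,e^{-x/\sqrt{\lambda}}\,dx - \frac{1}{\sqrt{\lambda}}\int_0^{a/\sqrt{\lambda}} e^{-x}\,\hat h_n(x\sqrt{\lambda})\,dx \\
&\quad + \frac{1}{\sqrt{\lambda}}\int_{x=0}^{a/\sqrt{\lambda}}\int_{y=0}^{x} \hat h_n(y\sqrt{\lambda})\,e^{-(2x-y)}\,dy\,dx + o_p(1) \\
&= -\frac{1}{2\sqrt{\lambda}}\int_0^{a/\sqrt{\lambda}} \hat h_n(x\sqrt{\lambda})\,e^{-x}\,dx + \frac{1}{\sqrt{\lambda}}\int_{x=0}^{a/\sqrt{\lambda}}\int_{y=0}^{x} \hat h_n(y\sqrt{\lambda})\,e^{-(2x-y)}\,dy\,dx + o_p(1) \\
&= -\tfrac12 h_0'(0) + \tfrac12 h_0'(0) + o_p(1) = o_p(1), \qquad n \to \infty,
\end{aligned}
\]

where we also use (A.2) to obtain the last line. So we get: $\tilde h_n'(0) \stackrel{p}{\longrightarrow} 0$ for $n \to \infty$, which means that $\tilde h_n'$ is inconsistent at zero. The other boundary point $a$ can be treated in a similar way. Finally, consider the behavior of $\bar h_n'$ at zero. Using (4.11) and Lemma 4.3, we get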

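\[
\bar h_n'(0) = h_1'(0) - \lambda^{-1/2}\bar c_1 + o_p(1) = \frac{h_1(0) - \bar c_1}{\sqrt{\lambda}} + o_p(1) = \int_0^{a/\sqrt{\lambda}} \frac{\hat h_n(y\sqrt{\lambda}) - \hat h_n(0)}{\sqrt{\lambda}}\,e^{-y}\,dy + o_p(1).
\]

Using (A.2), note that

\[
\left|\int_0^{a/\sqrt{\lambda}} \frac{\hat h_n(y\sqrt{\lambda}) - \hat h_n(0)}{\sqrt{\lambda}}\,e^{-y}\,dy - h_0'(0)\right| \le \left|\int_0^{a/\sqrt{\lambda}} \left\{\frac{h_0(y\sqrt{\lambda}) - h_0(0)}{\sqrt{\lambda}} - h_0'(0)\right\} e^{-y}\,dy\right| + \frac{2V_n}{\sqrt{\lambda}} + o_p(1) = o_p(1),
\]

under the assumptions of our theorem.

□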
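The limit behind the consistency of the boundary-constrained derivative at zero, namely that $\int_0^{a/\sqrt{\lambda}} \lambda^{-1/2}\{h_0(y\sqrt{\lambda}) - h_0(0)\}e^{-y}\,dy$ approaches $h_0'(0)$ as $\lambda \downarrow 0$, can be illustrated numerically. This is a sketch under stated assumptions: the hazard `h0`, the truncation `cutoff` and the helper names are hypothetical, and the true hazard is used in place of $\hat h_n$, so the estimation error $V_n$ plays no role here.

```python
import numpy as np

def trapezoid(values, grid):
    """Composite trapezoid rule on an increasing grid."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)

def boundary_slope(h, a, lam, m=200001, cutoff=50.0):
    """int_0^{a/sqrt(lam)} (h(y sqrt(lam)) - h(0)) / sqrt(lam) * e^{-y} dy,
    truncated at `cutoff` because e^{-y} is negligible beyond it."""
    s = np.sqrt(lam)
    upper = min(a / s, cutoff)
    y = np.linspace(0.0, upper, m)
    integrand = (h(y * s) - h(0.0)) / s * np.exp(-y)
    return trapezoid(integrand, y)

# Hypothetical smooth increasing hazard with h0'(0) = 1 (assumption).
h0 = lambda x: 1.0 + x + x ** 2
a = 1.0

slopes = {lam: boundary_slope(h0, a, lam) for lam in (1e-2, 1e-4, 1e-6)}
for lam, value in slopes.items():
    print(lam, value)  # values approach h0'(0) = 1 as lam decreases
```

For this choice of `h0` the integral equals $1 + 2\sqrt{\lambda}$ up to exponentially small terms, so the gap to $h_0'(0)$ is of order $\sqrt{\lambda}$.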